Large-Scale Noun Compound Interpretation Using Bootstrapping and the Web as a Corpus
Abstract
Responding to the need for semantic lexical resources in natural language processing applications, we examine methods to acquire noun compounds (NCs), e.g., orange juice, together with suitable fine-grained semantic interpretations, e.g., squeezed from, which are directly usable as paraphrases. We employ bootstrapping and web statistics, and utilize the relationship between NCs and paraphrasing patterns to jointly extract NCs and such patterns in multiple alternating iterations. In evaluation, we found that having one compound noun fixed yields both a higher number of semantically interpreted NCs and improved accuracy due to stronger semantic restrictions.
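The alternating extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the in-memory `TRIPLES` list is a hypothetical stand-in for web search results, and the function name `bootstrap`, its parameters, and the `min_count` threshold are assumptions made for illustration only. Each round first collects the paraphrasing patterns that link already-known NCs, then uses those patterns to find new NCs that share one fixed noun (here the head).

```python
from collections import Counter

# Hypothetical stand-in for web search hits: (head, pattern, modifier) triples,
# read as "<head> <pattern> <modifier>", e.g. "juice squeezed from orange".
TRIPLES = [
    ("juice", "squeezed from", "orange"),
    ("juice", "made from", "orange"),
    ("juice", "made from", "apple"),
    ("juice", "squeezed from", "lemon"),
    ("juice", "made from", "lemon"),
    ("cake", "made from", "apple"),
]


def bootstrap(seed_ncs, fixed_head, triples, iterations=2, min_count=1):
    """Alternately grow a set of NCs and a set of paraphrasing patterns.

    NCs are (modifier, head) pairs. Only NCs whose head equals `fixed_head`
    are added, mirroring the "one noun fixed" setting in the abstract.
    """
    ncs = set(seed_ncs)
    patterns = set()
    for _ in range(iterations):
        # Step 1: collect patterns that connect NCs we already know.
        counts = Counter(p for h, p, m in triples if (m, h) in ncs)
        patterns |= {p for p, c in counts.items() if c >= min_count}
        # Step 2: use those patterns to find new NCs with the fixed head.
        ncs |= {
            (m, h) for h, p, m in triples
            if h == fixed_head and p in patterns
        }
    return ncs, patterns


ncs, patterns = bootstrap({("orange", "juice")}, "juice", TRIPLES)
```

Starting from the single seed "orange juice", the sketch picks up both patterns and then the compounds "apple juice" and "lemon juice", while "apple cake" is excluded because its head differs from the fixed noun.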
Similar resources
Interpreting noun compounds using paraphrases Interpretación de los compuestos nominales mediante paráfrasis
Noun compounds are abundant in English and their interpretation is crucial for many natural language processing tasks. We propose a method for automatic two-noun noun compound interpretation that searches for suitable paraphrases in static corpora and then issues Web search engine queries to validate them. Native speakers were recruited to evaluate the returned paraphrases for noun compounds: t...
Interpreting Noun Compounds using Bootstrapping and Sense Collocation
This paper describes a bootstrapping method for automatically tagging noun compounds with their corresponding semantic relations. Our work takes advantage of the collocation of senses of the noun compound constituents and also word similarity. We exploit this to generate a set of noun compounds from a set of previously tagged noun compounds by replacing one constituent of each noun compound wit...
Web-Scale Features for Full-Scale Parsing
Counts from large corpora (like the web) can be powerful syntactic cues. Past work has used web counts to help resolve isolated ambiguities, such as binary noun-verb PP attachments and noun compound bracketings. In this work, we first present a method for generating web count features that address the full range of syntactic attachments. These features encode both surface evidence of lexical af...
Linked Open Data and Web Corpus Data for noun compound bracketing
This research provides a comparison of a linked open data resource (DBpedia) and web corpus data resources (Google Web Ngrams and Google Books Ngrams) for noun compound bracketing. Large corpus statistical analysis has often been used for noun compound bracketing, and our goal is to introduce a linked open data (LOD) resource for such task. We show its particularities and its performance on the...
Standardised Evaluation of English Noun Compound Interpretation
We present a tagged corpus for English noun compound interpretation and describe the method used to generate them. In order to collect noun compounds, we extracted binary noun compounds (i.e. noun-noun pairs) by looking for sequences of two nouns in the POS tag data of the Wall Street Journal. We then manually filtered out all noun compounds which were incorrectly tagged or included proper noun...
Publication date: 2011